<h2>DESCRIPTION</h2>

<em><b>r.fill.stats</b></em> is a module for fast gap filling and
interpolation (with smoothing) of dense raster data.

<p>

The <em>r.fill.stats</em> module is capable of quickly filling <b>small
data gaps</b> in large and high-resolution raster maps. It's primary purpose
is to improve high-resolution, rasterized sensor data (such as Lidar
data). As a rule of thumb, there
should be at least as many data cells as there are "no data" (NULL) cells in
the input raster map. Several interpolation modes are available. By
default, all values of the input raster map will be replaced with
locally interpolated values in the output raster map. This is the
equivalent of running a low-pass smoothing filter on the interpolated
data and is often desirable, owing to noisy nature of high-resolution
sensor data. With dense data and small gaps, this should result in only slight
deviations from the original data and produce smooth output. Alternatively,
setting the <b>-k</b> flag will disable the low-pass filter and preserve
the original cell values.

<p>

Available gap filling modes:
<ul>
<li>spatially weighted mean (default)</li>
<li>simple mean</li>
<li>simple median</li>
<li>simple mode</li>
</ul>

<p>

The spatially weighted mean is equivalent to an Inverse Distance
Weighting (IDW;
see also <em><a href="r.surf.idw.html">r.surf.idw</a></em>)
interpolation. The simple mean is equivalent to a simple low-pass filter.
Median and mode replacements can also be achieved using
<em><a href="r.neighbors.html">r.neighbors</a></em>.

<p>

Note that <em>r.fill.stats</em> is highly optimized for fast processing
of large raster datasets with a small interpolation distance threshold
(i.e. closing small gaps). As a trade-off for speed and a small memory
foot print, some spatial accuracy is lost due to the rounding of the
distance threshold to the closest approximation in input raster cells
and the use of a matrix of precomputed weights at cell resolution (see
further down for details). Note also that processing time will increase
exponentially with higher distance settings. Cells outside the distance
threshold will not be interpolated, preserving the edges of the original data
extent.

<p>

This module is not well suited for interpolating sparse input data and
closing large gaps. For such purposes more appropriate
methods are available, such as
<em><a href="v.surf.bspline.html">v.surf.bspline</a></em>,
<em><a href="v.surf.idw.html">v.surf.idw</a></em> or
<em><a href="v.surf.rst.html">v.surf.rst</a></em>.

<p>

Applications where the properties of <em>r.fill.stats</em> are
advantageous include the processing of high resolution, close range
scanning and remote sensing datasets. Such datasets commonly feature
densely spaced measurements that have some gaps after rasterization,
due to blind spots, instrument failures, and misalignments between the
GIS' raster cell grids and the original measurement locations. In these
cases, <em>r.fill.stats</em> should typically be run using the "weighted
mean" (default) method and with a small distance setting (the default
is to use a search radius of just three cells).

<p>

The images below show a gradiometer dataset with gaps and its
interpolated equivalent, produced using the spatially weighted mean
operator (<tt>mode="wmean"</tt>).

<p>

<img src="r_fill_stats_01.png" alt="Raw data">  <img src="r_fill_stats_02.png" alt="Gaps filled">

<p>

In addition, <em>r.fill.stats</em> can be useful for raster map
generalization. Often, this involves removing small clumps of
categorized cells and then filling the resulting data gaps without
"inventing" any new values. In such cases, the "mode" or "median"
interpolators should be used.


<h3>Usage</h3>

The most critical user-provided settings are the interpolation/gap
filling method (<b>mode</b>) and the maximum distance, up to which
<em>r.fill.stats</em> will scan for cells with values (<b>distance</b>).
The distance can be expressed as a number of cells (default) or in the
current GRASS location's units (if the <b>-m</b> flag is given). The latter
are typically meters, but can be any other units of a <em>planar</em>
coordinate system.

<p>

Note that proper handling of geodetic coordinates (lat/lon) and
distances is currently not implemented. For lat/lon data, the distance
should therefore be specified in cells and usage of
<em>r.fill.stats</em> should be restricted to small areas to avoid large
inaccuracies that may arise from the different extents of cells along
the latitudinal and longitudinal axes.

<p>

Distances specified in map units (<b>-m</b> flag) will be approximated
as accurately as the current region's cell resolution settings allow.
The program will warn if the distance cannot be expressed as whole
cells at the current region's resolution. In such case, the number of
cells in the search window will be rounded up. Due to the rounding
effect introduced by using cells as spatial units, the actual maximum
distance considered by the interpolation may be up to half a cell
diagonal larger than the one specified by the user.

<p>

The interpolator type "wmean" uses a precomputed matrix of spatial
weights to speed up computation. This matrix can be examined (printed
to the screen) before running the interpolation, by setting the
<b>-w</b> flag. In mode "wmean", the <b>power</b> option has the usual
meaning in IDW: higher values mean that cell values in the neighborhood
lose their influence on the cell to be interpolated more rapidly with
increasing distance from the latter. Another way of explaining this
effect is to state that larger "power" settings result in more
localized interpolation, smaller ones in more globalized interpolation.
The default setting is <tt>power=2.0</tt>.

<p>

The interpolators "mean", "median" and "mode" are calculated from all
cell values within the search radius. No spatial weighting is applied
for these methods. The "mode" of the input data may not always be
unique. In such case, the mode will be the smallest value with the
highest frequency.

<p>

Often, input data will contain spurious extreme measurements (spikes,
outliers, noise) caused by the limits of device sensitivity, hardware
defects, environmental influences, etc. If the normal, valid range of
input data is known beforehand, then the <b>minimum</b> and
<b>maximum</b> options can be used to exclude those input cells that
have values below or above that range, respectively. This will prevent
the influence of spikes and outliers from spreading through the
interpolation.

<p>

Unless the <b>-k</b> (keep) flag is given, data cells of the input
map will be replaced with interpolated values instead of preserving
them in the output. In modes "wmean" and "mean", this results in a
smoothing effect that includes the original data (see below)!

<p>

Besides the result of the interpolation/gap filling, a second output
map can be specified via the <b>uncertainty</b> option. The cell values
in this map represent a simple measure of how much uncertainty was
involved in interpolating each cell value of the primary output map,
with "0" being the lowest and "1" being the theoretic highest
uncertainty. Uncertainty is measured by summing up all cells in the
neighborhood (defined by the search radius <b>distance</b>) that
contain a value in the <b>input</b> map, multiplied by their weights,
and dividing the result by the sum of all weights in the neighborhood.
For <tt>mode=wmean</tt>, this means that interpolated output cells that
were computed from many nearby input cells have very low uncertainty
and vice versa. For all other modes, all weights in the neighborhood
are constant "1" and the uncertainty measure is a simple measure of how
many input data cells were present in the search window.

<h3>Smoothing</h3>

The <em>r.fill.stats</em> module uses the interpolated values to adjust
the original values and create a smooth surface, which is akin to running
a low-pass filter on the data. Since most high-resolution sensor data
is noisy, this is normally a desired effect and results in output that
is more suitable for deriving secondary products, such as slope, aspect
and curvature maps. Larger settings for the search radius (<b>distance</b>)
will result in a stronger smoothing. In practice, some experimentation
with different settings for <b>distance</b> and <b>power</b> might be required
to achieve good results. In some cases (e.g. when dealing with low-noise or
classified data), it might be desirable to turn off data smoothing by
setting the <b>-k</b> (keep) flag. This will ensure that the original
cell data is copied through to the result map without alteration. 

<center>
<a href="r_fill_stats_smoothing.png">
    <img src="r_fill_stats_smoothing.png" alt="Smooth versus preserve" width="600" height="300">
</a>
<p><em>
Effect of smoothing the original data: The top row shows a gap-filled surface computed from a rasterized Lidar point  
cloud (using <tt>mode=wmean</tt> and <tt>power=2</tt>), and the derived slope, aspect,
and profile curvature maps. The smoothing effect is clearly visible.
The bottom row shows the effect of setting the <b>-k</b> flag: Preserving the original
cell values in the interpolated output produces and unsmoothed, noisy surface, and likewise
noisy derivative maps.
</em></p>
</center>

The effect can be seen in the illustration above:
Slope, aspect, and profile curvature are computed using the
<em><a href="r.slope.aspect.html">r.slope.aspect</a></em> module, which
uses a window (kernel) for computations that considers only the
immediate neighborhood of each cell. When performed on noisy data,
such local operations result in equally noisy derivatives if the
original data is preserved (by setting the <b>-k</b> flag) and no smoothing
is performed.  

<p>

(Note that the effects of noisy data can also be avoided by using modules
that are not restricted to minimal kernel sizes. For example, aspect and other morphometric parameters can be computed
using the <em><a href="r.param.scale.html">r.param.scale</a></em> module
which operates with variable-size cell neighborhoods.)

<!--
wget http://fatra.cnr.ncsu.edu/uav-lidar-analytics-course/midpines_2015_spm.las -O points.las

g.region n=220220 s=218690 e=637800 w=636270 res=1

r.in.lidar input=points.las output=ground method=mean class_filter=2

r.fill.stats input=ground output=ground_filled mode=wmean power=2.0 cells=8
r.fill.stats -k input=ground output=ground_filled2 mode=wmean power=2.0 cells=8

r.slope.aspect elevation=ground_filled slope=slope aspect=aspect pcurvature=pcur tcurvature=tcur
r.slope.aspect elevation=ground_filled2 slope=slope2 aspect=aspect2 pcurvature=pcur2 tcurvature=tcur2

g.region n=219537 s=219172 w=636852 e=637218 -k

d.mon cairo output=r_fill_gaps_smoothing.png width=1464 height=730
export GRASS_FONT=LiberationSans-Regular
d.frame frame=u1 at=50,100,0,25 -c
d.rast ground_filled
d.text text="surf. (smoothed)" at=5,5 size=10 bgcolor=white color=black
d.frame frame=u2 at=50,100,25,50 -c
d.rast slope
d.text text="slope" at=5,5 size=10 bgcolor=white color=black
d.frame frame=u3 at=50,100,50,75 -c
d.rast aspect
d.text text="aspect" at=5,5 size=10 bgcolor=white color=black
d.frame frame=u4 at=50,100,75,100 -c
d.rast pcur
d.text text="prof. curvature" at=5,5 size=10 bgcolor=white color=black
d.frame frame=l1  at=0,50,0,25 -c
d.rast ground_filled2
d.text text="surf. (preserved)" at=5,5 size=10 bgcolor=white color=black
d.frame frame=l2 at=0,50,25,50 -c
d.rast slope2
d.text text="slope" at=5,5 size=10 bgcolor=white color=black
d.frame frame=l3 at=0,50,50,75 -c
d.rast aspect2
d.text text="aspect" at=5,5 size=10 bgcolor=white color=black
d.frame frame=l4 at=0,50,75,100 -c
d.rast pcur2
d.text text="prof. curvature" at=5,5 size=10 bgcolor=white color=black
d.mon stop=cairo
-->

<h3>Spatial weighting scheme</h3>

The key to getting good gap filling results is to understand the
spatial weighting scheme used in mode "wmean". The weights are
precomputed and assigned per cell within the search window centered on
the location at which to interpolate/gap fill all cells within the
user-specified distance.

<p>

The illustration below shows the precomputed weights matrix for a
search distance of four cells from the center cell:

<pre>
000.00 000.01 000.04 000.07 000.09 000.07 000.04 000.01 000.00
000.01 000.06 000.13 000.19 000.22 000.19 000.13 000.06 000.01
000.04 000.13 000.25 000.37 000.42 000.37 000.25 000.13 000.04
000.07 000.19 000.37 000.56 000.68 000.56 000.37 000.19 000.07
000.09 000.22 000.42 000.68 001.00 000.68 000.42 000.22 000.09
000.07 000.19 000.37 000.56 000.68 000.56 000.37 000.19 000.07
000.04 000.13 000.25 000.37 000.42 000.37 000.25 000.13 000.04
000.01 000.06 000.13 000.19 000.22 000.19 000.13 000.06 000.01
000.00 000.01 000.04 000.07 000.09 000.07 000.04 000.01 000.00
</pre>

Note that the weights in such a small window drop rapidly for the
default setting of <tt>power=2</tt>.

<p>

If the distance is given in map units (flag <tt>-m</tt>), then the
search window can be modeled more accurately as a circle. The
illustration below shows the precomputed weights for a distance in map
units that is approximately equivalent to four cells from the center
cell:

<pre>
...... ...... ...... 000.00 000.00 000.00 ...... ...... ......
...... 000.00 000.02 000.06 000.09 000.06 000.02 000.00 ......
...... 000.02 000.11 000.22 000.28 000.22 000.11 000.02 ......
000.00 000.07 000.22 000.44 000.58 000.44 000.22 000.07 000.00
000.00 000.09 000.28 000.58 001.00 000.58 000.28 000.09 000.00
000.00 000.07 000.22 000.44 000.58 000.44 000.22 000.07 000.00
...... 000.02 000.11 000.22 000.28 000.22 000.11 000.02 ......
...... 000.00 000.02 000.06 000.09 000.06 000.02 000.00 ......
...... ...... ...... 000.00 000.00 000.00 ...... ...... ......
</pre>

<p>

When using a small search radius, <b>cells</b> must also be set to a
small value. Otherwise, there may not be enough cells with data within
the search radius to support interpolation.


<h2>NOTES</h2>

The straight-line metric used for converting distances in map units to
cell numbers is only adequate for planar coordinate systems. Using this
module with lat/lon input data will likely give inaccurate results,
especially when interpolating over large geographical areas.

<p>

If the distance is set to a relatively large value, processing time
will quickly approach and eventually exceed that of point-based
interpolation modules such as
<em><a href="v.surf.rst.html">v.surf.rst</a></em>.

<p>

This module can handle cells with different X and Y resolutions.
However, note that the weight matrix will be skewed in such cases, with
higher weights occurring close to the center and along the axis with
the higher resolution. This is because weights on the lower resolution
axis are less accurately calculated. The skewing effect will be
stronger if the difference between the X and Y axis resolution is
greater and a larger "power" setting is chosen. This property of the
weights matrix directly reflects the higher information density along
the higher resolution axis.

<p>

Note on printing the weights matrix (using the <b>-w</b> flag): the
matrix cannot be printed if it is too large.

<p>

The memory estimate provided by the <b>-u</b> flag is a lower limit on
the amount of RAM that will be needed.

<p>

If the <b>-s</b> flag is set, floating point type output will be
saved as a "single precision" raster map, saving ~50% disk space
compared to the default "double precision" output.


<h2>EXAMPLES</h2>

<h3>Gap-filling of a dataset using spatially weighted mean (IDW)</h3>

Gap-fill a dataset using spatially weighted mean (IDW) and a maximum
search radius of 3.0 map units; also produce uncertainty estimation
map:

<div class="code"><pre>
r.fill.stats input=measurements output=result dist=3.0 -m mode=wmean uncertainty=uncert_map
</pre></div>

Run a fast low-pass filter (replacement all cells with mean value of
neighboring cells) on the input map:

<div class="code"><pre>
r.fill.stats input=measurements output=result dist=10 mode=mean
</pre></div>

Fill data gaps in a categorized raster map; preserve existing data:

<div class="code"><pre>
r.fill.stats input=categories output=result dist=100 -m mode=mode -k
</pre></div>

<h3>Lidar point cloud example</h3>

<!--
Using:
http://fatra.cnr.ncsu.edu/uav-lidar-analytics-course/midpines_2015_spm.las
-->

Inspect the point density and determine the extent of the point cloud
using the <em><a href="r.in.lidar.html">r.in.lidar</a></em> module:

<div class="code"><pre>
r.in.lidar -e input=points.las output=density method=n resolution=5 class_filter=2
</pre></div>

Based on the result, set computational region extent and desired
resolution:

<div class="code"><pre>
g.region -pa raster=density res=1
</pre></div>

Import the point cloud as raster using binning:

<div class="code"><pre>
r.in.lidar input=points.las output=ground_raw method=mean class_filter=2
</pre></div>

Check that there are more non-NULL cells than NULL ("no data") cells:

<div class="code"><pre>
r.univar map=ground_raw
</pre></div>

<pre>
total null and non-null cells: 2340900
total null cells: 639184
...
</pre>

Fill in the NULL cells using the default 3-cell search radius:

<div class="code"><pre>
r.fill.stats input=ground output=ground_filled uncertainty=uncertainty distance=3 mode=wmean power=2.0 cells=8
</pre></div>

<center>
<a href="r_fill_stats_lidar.png"><img src="r_fill_stats_lidar.png" alt="Point density and ground surface" width=600 height=600></a>
<p><em>
Binning of Lidar and resulting ground surface with filled gaps.
Note the remaining NULL cells (white) in the resulting ground surface.
These are areas with a lack of cells with values in close proximity.
</em></p>
</center>

<!--
d.mon cairo output=r_fill_gaps_lidar.png width=1530 height=1530
export GRASS_FONT=LiberationSans-Regular
d.frame frame=ul at=50,100,0,50 -c
d.rast density3
d.legend raster=density3 title="Density" at=50,95,2,10 -bsf
d.text text="density" at=5,5 size=10 bgcolor=white color=black
d.frame frame=ur at=50,100,50,100 -c
d.rast ground
d.text text="ground_raw" at=5,5 size=10 bgcolor=white color=black
d.legend raster=ground title="Ground (raw)" at=50,95,2,10 -bsf
d.frame frame=ll  at=0,50,0,50 -c
d.rast uncertainty
d.text text="uncertainty" at=5,5 size=10 bgcolor=white color=black
d.legend raster=uncertainty title="Uncertainty" at=50,95,2,10 -bsf
d.frame frame=lr at=0,50,50,100 -c
d.rast ground_filled
d.text text="ground_filled" at=5,5 size=10 bgcolor=white color=black
d.legend raster=ground title="Ground (filled)" at=50,95,2,10 -bsf
d.mon stop=cairo
-->

<h3>Outlier removal and gap-filling of SRTM elevation data</h3>

In this example, the SRTM elevation map in the
North Carolina sample dataset location is filtered for outlier
elevation values; missing pixels are then re-interpolated to obtain
a complete elevation map:

<div class="code"><pre>
g.region raster=elev_srtm_30m -p
d.mon wx0
d.histogram elev_srtm_30m

# remove SRTM outliers, i.e. SRTM below 50m (esp. lakes), leading to no data areas
r.mapcalc "elev_srtm_30m_filt = if(elev_srtm_30m < 50.0, null(), elev_srtm_30m)"
d.histogram elev_srtm_30m_filt
d.rast elev_srtm_30m_filt

# using the IDW method to fill these holes in DEM without low-pass filter
# increase distance to gap-fill larger holes
r.fill.stats -k input=elev_srtm_30m_filt output=elev_srtm_30m_idw distance=100

d.histogram elev_srtm_30m_idw
d.rast elev_srtm_30m_idw

r.mapcalc "diff_orig_idw = elev_srtm_30m - elev_srtm_30m_idw"
r.colors diff_orig_idw color=differences

r.univar -e diff_orig_idw
d.erase
d.rast diff_orig_idw
d.legend diff_orig_idw
</pre></div>

<h2>SEE ALSO</h2>

<em>
<a href="r.fillnulls.html">r.fillnulls</a>,
<a href="r.neighbors.html">r.neighbors</a>,
<a href="r.surf.idw.html">r.surf.idw</a>,
<a href="v.surf.bspline.html">v.surf.bspline</a>,
<a href="v.surf.idw.html">v.surf.idw</a>,
<a href="v.surf.rst.html">v.surf.rst</a>
</em>

<p>
<a href="http://en.wikipedia.org/wiki/Inverse_distance_weighting">Inverse Distance Weighting in Wikipedia</a>

<h2>AUTHOR</h2>

Benjamin Ducke

<!--
<p>
<i>Last changed: $Date$</i>
-->

